In the field of molecular biology, a transcription factor (sometimes called a sequence-specific DNA binding factor) is a protein that binds to specific DNA sequences and thereby controls the transfer (or transcription) of genetic information from DNA to mRNA.[1][2] Transcription factors perform this function alone or with other proteins in a complex, by promoting (as an activator), or blocking (as a repressor) the recruitment of RNA polymerase (the enzyme that performs the transcription of genetic information from DNA to RNA) to specific genes.[3][4][5]
A defining feature of transcription factors is that they contain one or more DNA-binding domains (DBDs), which attach to specific sequences of DNA adjacent to the genes that they regulate.[6][7] Additional proteins such as coactivators, chromatin remodelers, histone acetylases, deacetylases, kinases, and methylases, while also playing crucial roles in gene regulation, lack DNA-binding domains, and therefore are not classified as transcription factors.[8]
Transcription factor glossary |
---|
• transcription – copying of DNA by RNA polymerase into messenger RNA |
• factor – a substance, such as a protein, that contributes to the cause of a specific biochemical reaction or bodily process |
• transcriptional regulation – controlling the rate of gene transcription for example by helping or hindering RNA polymerase binding to DNA |
• upregulation, activation, or promotion – increase the rate of gene transcription |
• downregulation, repression, or suppression – decrease the rate of gene transcription |
• coactivator – a protein that works with transcription factors to increase the rate of gene transcription |
• corepressor – a protein that works with transcription factors to decrease the rate of gene transcription |
Transcription factors are essential for the regulation of gene expression and are, as a consequence, found in all living organisms. The number of transcription factors found within an organism increases with genome size, and larger genomes tend to have more transcription factors per gene.[9]
There are approximately 2600 proteins in the human genome that contain DNA-binding domains, and most of these are presumed to function as transcription factors.[10] Therefore, approximately 10% of genes in the genome code for transcription factors, which makes this family the single largest family of human proteins. Furthermore, genes are often flanked by several binding sites for distinct transcription factors, and efficient expression of each of these genes requires the cooperative action of several different transcription factors (see, for example, hepatocyte nuclear factors). Hence, the combinatorial use of a subset of the approximately 2000 human transcription factors easily accounts for the unique regulation of each gene in the human genome during development.[8]
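The combinatorial argument above can be checked with a little arithmetic. The sketch below counts how many distinct unordered sets of transcription factors can be formed from roughly 2000 factors. The gene count of 26,000 is an assumption derived from the figures in the text (2600 factors being about 10% of genes), and the function name is hypothetical.

```python
from math import comb

N_FACTORS = 2000   # approximate number of human transcription factors (from the text)
N_GENES = 26_000   # assumption: implied by "2600 factors is about 10% of genes"

def distinct_sets(k, n_factors=N_FACTORS):
    """Number of unordered sets of k different factors chosen from n_factors."""
    return comb(n_factors, k)

for k in (1, 2, 3):
    n_sets = distinct_sets(k)
    print(f"sets of {k} factor(s): {n_sets:,}; covers every gene: {n_sets >= N_GENES}")
```

Under these assumptions, single factors cannot give each gene a unique regulator, but pairs already outnumber genes and triples exceed a billion combinations. The count ignores binding-site order, spacing, and factor concentration, all of which would add further distinctions.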
Transcription factors bind to either enhancer or promoter regions of DNA adjacent to the genes that they regulate. Depending on the transcription factor, the transcription of the adjacent gene is either up- or down-regulated. Transcription factors use a variety of mechanisms for the regulation of gene expression.[11] These mechanisms include:
Transcription factors are one of the groups of proteins that read and interpret the genetic "blueprint" in the DNA. They bind DNA and help initiate a program of increased or decreased gene transcription. As such, they are vital for many important cellular processes. Below are some of the important functions and biological roles transcription factors are involved in:
In eukaryotes, an important class of transcription factors called general transcription factors (GTFs) is necessary for transcription to occur.[14][15][16] Many of these GTFs do not actually bind DNA but are part of the large transcription preinitiation complex that interacts with RNA polymerase directly. The most common GTFs are TFIIA, TFIIB, TFIID (see also TATA binding protein), TFIIE, TFIIF, and TFIIH.[17] The preinitiation complex binds to promoter regions of DNA upstream of the gene that it regulates.
Other transcription factors differentially regulate the expression of various genes by binding to enhancer regions of DNA adjacent to regulated genes. These transcription factors are critical to making sure that genes are expressed in the right cell at the right time and in the right amount depending on the changing requirements of the organism.
Many transcription factors in multicellular organisms are involved in development.[18] Responding to cues (stimuli), these transcription factors turn on or off the transcription of the appropriate genes, which in turn allows for the changes in cell morphology or activity needed for cell fate determination and cellular differentiation. The Hox transcription factor family, for example, is important for proper body pattern formation in organisms as diverse as fruit flies and humans.[19][20] Another example is the transcription factor encoded by the Sex-determining Region Y (SRY) gene, which plays a major role in determining sex in humans.[21]
Cells can communicate with each other by releasing molecules that produce signaling cascades within another receptive cell. If the signal requires upregulation or downregulation of genes in the recipient cell, often transcription factors will be downstream in the signaling cascade.[22] Estrogen signaling is an example of a fairly short signaling cascade that involves the estrogen receptor transcription factor: estrogen is secreted by tissues such as the ovaries and placenta, crosses the cell membrane of the recipient cell, and is bound by the estrogen receptor in the cell's cytoplasm. The estrogen receptor then goes to the cell's nucleus and binds to its DNA-binding sites, changing the transcriptional regulation of the associated genes.[23]
Not only do transcription factors act downstream of signaling cascades related to biological stimuli but they can also be downstream of signaling cascades involved in environmental stimuli. Examples include heat shock factor (HSF), which upregulates genes necessary for survival at higher temperatures,[24] hypoxia inducible factor (HIF), which upregulates genes necessary for cell survival in low-oxygen environments,[25] and sterol regulatory element binding protein (SREBP), which helps maintain proper lipid levels in the cell.[26]
Many transcription factors, especially some that are oncogenes or tumor suppressors, help regulate the cell cycle and as such determine how large a cell will get and when it can divide into two daughter cells.[27][28] One example is the Myc oncogene, which has important roles in cell growth and apoptosis.[29]
It is common in biology for important processes to have multiple layers of regulation and control. This is also true with transcription factors: not only do transcription factors control the rates of transcription to regulate the amounts of gene products (RNA and protein) available to the cell, but transcription factors themselves are regulated (often by other transcription factors). Below is a brief synopsis of some of the ways that the activity of transcription factors can be regulated:
Transcription factors (like all proteins) are transcribed from a gene on a chromosome into RNA, and then the RNA is translated into protein. Any of these steps can be regulated to affect the production (and thus activity) of a transcription factor. One interesting implication of this is that transcription factors can regulate themselves. For example, in a negative feedback loop, the transcription factor acts as its own repressor: if the transcription factor protein binds the DNA of its own gene, it will down-regulate the production of more of itself. This is one mechanism to maintain low levels of a transcription factor in a cell.
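The negative feedback loop described above can be illustrated with a toy simulation. This is a minimal sketch, not a model taken from the source: it assumes that repression follows a Hill function, that the protein is removed at a constant rate, and that all parameter values (`beta`, `gamma`, `K`, `hill`) are arbitrary illustrative numbers.

```python
def simulate(beta=10.0, gamma=1.0, K=1.0, hill=2.0, autorepress=True,
             dt=0.01, steps=5000):
    """Euler-integrate the level x of a transcription factor and return its final value.

    beta  : maximal production rate (hypothetical units)
    gamma : first-order removal rate (degradation and dilution)
    K     : level of x at which production is halved (assumed Hill repression)
    hill  : Hill coefficient controlling how sharply repression sets in
    """
    x = 0.0
    for _ in range(steps):
        if autorepress:
            # The factor binds its own gene and lowers its own production.
            production = beta / (1.0 + (x / K) ** hill)
        else:
            # Unregulated gene: constant production.
            production = beta
        x += dt * (production - gamma * x)
    return x

print("unregulated level:  ", round(simulate(autorepress=False), 3))
print("autorepressed level:", round(simulate(autorepress=True), 3))
```

With these illustrative parameters the autorepressed factor settles at a much lower level than the unregulated one, which is the qualitative behavior the paragraph describes.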
In eukaryotes, transcription factors (like most proteins) are transcribed in the nucleus but are then translated in the cell's cytoplasm. Many proteins that are active in the nucleus contain nuclear localization signals that direct them to the nucleus. But, for many transcription factors, this is a key point in their regulation.[30] Important classes of transcription factors such as some nuclear receptors must first bind a ligand while in the cytoplasm before they can relocate to the nucleus.[30]
Transcription factors may be activated (or deactivated) through their signal-sensing domain by a number of mechanisms including:
In eukaryotes, genes that are not being actively transcribed are often located in heterochromatin. Heterochromatin consists of regions of chromosomes that are heavily compacted by tightly bundling the DNA onto histones and then organizing the histones into compact chromatin fibers. DNA within heterochromatin is inaccessible to many transcription factors. For the transcription factor to bind to its DNA-binding site, the heterochromatin must first be converted to euchromatin, usually via histone modifications. A transcription factor's DNA-binding site may also be inaccessible if the site is already occupied by another transcription factor. Pairs of transcription factors can play antagonistic roles (activator versus repressor) in the regulation of the same gene.
Most transcription factors do not work alone. Often, for gene transcription to occur, a number of transcription factors must bind to DNA regulatory sequences. This collection of transcription factors in turn recruits intermediary proteins such as cofactors that allow efficient recruitment of the preinitiation complex and RNA polymerase. Thus, for a single transcription factor to initiate transcription, all of these other proteins must also be present, and the transcription factor must be in a state where it can bind to them if necessary.
Transcription factors are modular in structure and contain the following domains:[1]
Trans-activating domains (TADs) are named after their amino acid composition. These amino acids are either essential for the activity or simply the most abundant in the TAD. Transactivation by the Gal4 transcription factor is mediated by acidic amino acids whereas hydrophobic residues in Gcn4 play a similar role. Hence the TADs in Gal4 and Gcn4 are referred to as acidic or hydrophobic activation domains respectively.[34]
The nine-amino-acid transactivation domain (9aaTAD) defines a novel domain common to a large superfamily of eukaryotic transcription factors represented by Gal4, Oaf1, Leu3, Rtg3, Pho4, Gln3 and Gcn4 in yeast and by p53, NFAT, NF-κB and VP16 in mammals.[35] Prediction of 9aaTADs (for both acidic and hydrophilic transactivation domains) is available online from ExPASy[36] and EMBnet Spain.[37]
The 9aaTAD transcription factors p53, VP16, MLL, E2A, HSF1, NF-IL6, NFAT1 and NF-κB interact directly with the general coactivators TAF9 and CBP/p300.[38] The p53 9aaTADs interact with TAF9, GCN5 and with multiple domains of CBP/p300 (KIX, TAZ1, TAZ2 and IBiD).[39]
The KIX domain of the general coactivator Med15 (Gal11) interacts with the 9aaTAD transcription factors Gal4, Pdr1, Oaf1, Gcn4, VP16, Pho4, Msn2, Ino2 and P201.[40] Interactions of Gal4, Pdr1 and Gcn4 with Taf9 have been reported.[41] The 9aaTAD is a common transactivation domain that recruits multiple general coactivators: TAF9, MED15, CBP/p300 and GCN5.[42]
The portion (domain) of the transcription factor that binds DNA is called its DNA-binding domain. Below is a partial list of some of the major families of DNA-binding domains/transcription factors:
Family | InterPro | Pfam | SCOP |
---|---|---|---|
basic-helix-loop-helix[43] | IPR001092 | Pfam PF00010 | SCOP 47460 |
basic-leucine zipper (bZIP)[44] | IPR004827 | Pfam PF00170 | SCOP 57959 |
C-terminal effector domain of the bipartite response regulators | IPR001789 | Pfam PF00072 | SCOP 46894 |
GCC box | | | SCOP 54175 |
helix-turn-helix[45] | | | |
homeodomain proteins - bind to homeobox DNA sequences, which in turn encode other transcription factors. Homeodomain proteins play critical roles in the regulation of development.[46] | IPR009057 | Pfam PF00046 | SCOP 46689 |
lambda repressor-like | IPR010982 | | SCOP 47413 |
srf-like (serum response factor) | IPR002100 | Pfam PF00319 | SCOP 55455 |
paired box[47] | | | |
winged helix | IPR013196 | Pfam PF08279 | SCOP 46785 |
zinc fingers[48] | | | |
* multi-domain Cys2His2 zinc fingers[49] | IPR007087 | Pfam PF00096 | SCOP 57667 |
* Zn2/Cys6 | | | SCOP 57701 |
* Zn2/Cys8 nuclear receptor zinc finger | IPR001628 | Pfam PF00105 | SCOP 57716 |
The DNA sequence that a transcription factor binds to is called a transcription factor-binding site or response element.[50]
Transcription factors interact with their binding sites using a combination of electrostatic interactions (of which hydrogen bonds are a special case) and van der Waals forces. Due to the nature of these chemical interactions, most transcription factors bind DNA in a sequence-specific manner. However, not all bases in the transcription factor-binding site necessarily interact with the transcription factor, and some of these interactions may be weaker than others. Thus, transcription factors do not bind just one sequence but are capable of binding a subset of closely related sequences, each with a different strength of interaction.
For example, although the consensus binding site for the TATA-binding protein (TBP) is TATAAAA, the TBP transcription factor can also bind similar sequences such as TATATAT or TATATAA.
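One simple way to express how close a candidate site is to a consensus is to count mismatching positions. The sketch below does this for the TBP sequences quoted above. The mismatch count is only a crude stand-in for binding strength, which in practice depends on which positions differ, and the function name is hypothetical.

```python
CONSENSUS = "TATAAAA"  # TBP consensus binding site quoted in the text

def mismatches(site, consensus=CONSENSUS):
    """Count positions at which a candidate site differs from the consensus."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return sum(a != b for a, b in zip(site, consensus))

for site in ("TATAAAA", "TATATAT", "TATATAA"):
    print(site, "mismatches vs consensus:", mismatches(site))
```

A position weight matrix, which scores each base at each position separately, is the more usual representation when enough known binding sites are available to estimate it.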
Because transcription factors can bind a set of related sequences and these sequences tend to be short, potential transcription factor binding sites can occur by chance if the DNA sequence is long enough. It is unlikely, however, that a transcription factor binds all compatible sequences in the genome of the cell. Other constraints, such as DNA accessibility in the cell or availability of cofactors may also help dictate where a transcription factor will actually bind. Thus, given the genome sequence it is still difficult to predict where a transcription factor will actually bind in a living cell.
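The claim that short sites occur by chance can be made concrete with a back-of-the-envelope estimate. The sketch below assumes a random sequence in which all four bases are equally likely and independent, which real genomes are not, and uses a genome length of 3 billion base pairs purely as an illustrative figure. The function name is hypothetical.

```python
def expected_chance_sites(genome_length, site_length, both_strands=True):
    """Expected number of exact matches to one fixed site in a uniform random sequence."""
    positions = genome_length - site_length + 1   # possible start positions
    if positions <= 0:
        return 0.0
    per_strand = positions * 0.25 ** site_length  # each base matches with probability 1/4
    # Doubling for the reverse strand overcounts sites that equal their own
    # reverse complement; this sketch ignores that correction.
    return 2 * per_strand if both_strands else per_strand

for k in (6, 8, 10, 15):
    print(f"{k}-bp site: about {expected_chance_sites(3_000_000_000, k):,.0f} chance matches")
```

Under these assumptions a 6-bp site is expected more than a million times in a genome of that size, which is consistent with the point that sequence alone does not determine where a factor actually binds.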
Additional recognition specificity, however, may be obtained through the use of more than one DNA-binding domain (for example tandem DBDs in the same transcription factor or through dimerization of two transcription factors) that bind to two or more adjacent sequences of DNA.
Transcription factors are of clinical significance for at least two reasons: (1) mutations can be associated with specific diseases, and (2) they can be targets of medications.
Because transcription factors have important roles in development, intercellular signaling, and the cell cycle, some human diseases have been associated with mutations in them.[51]
Many transcription factors are either tumor suppressors or oncogenes, and thus mutations in them or their aberrant regulation are associated with cancer. Three groups of transcription factors are known to be important in human cancer: (1) the NF-kappaB and AP-1 families, (2) the STAT family, and (3) the steroid receptors.[52]
Below are a few of the more well-studied examples:
Condition | Description | Locus |
---|---|---|
Rett syndrome | Mutations in the MECP2 transcription factor are associated with Rett syndrome, a neurodevelopmental disorder.[53][54] | Xq28 |
Diabetes | A rare form of diabetes called MODY (Maturity onset diabetes of the young) can be caused by mutations in hepatocyte nuclear factors (HNFs)[55] or insulin promoter factor-1 (IPF1/Pdx1).[56] | multiple |
Developmental verbal dyspraxia | Mutations in the FOXP2 transcription factor are associated with developmental verbal dyspraxia, a disease in which individuals are unable to produce the finely coordinated movements required for speech.[57] | 7q31 |
Autoimmune diseases | Mutations in the FOXP3 transcription factor cause a rare form of autoimmune disease called IPEX.[58] | Xp11.23-q13.3 |
Li-Fraumeni syndrome | Caused by mutations in the tumor suppressor p53.[59] | 17p13.1 |
Breast cancer | The STAT family is relevant to breast cancer.[60] | multiple |
Multiple cancers | Members of the HOX family are involved in a variety of cancers.[61] | multiple |
Approximately 10% of currently prescribed drugs directly target the nuclear receptor class of transcription factors.[62] Examples include tamoxifen and bicalutamide for the treatment of breast and prostate cancer, respectively, and various types of anti-inflammatory and anabolic steroids.[63] In addition, transcription factors are often indirectly modulated by drugs through signaling cascades. It might be possible to directly target other less-explored transcription factors such as NF-κB with drugs.[64][65][66][67] Transcription factors outside the nuclear receptor family are thought to be more difficult to target with small-molecule therapeutics, since it is not clear that they are "druggable", but progress has been made on the Notch pathway.[68]
Different technologies are available to analyze transcription factors. At the genomic level, DNA sequencing[69] and database research are commonly used. The transcription factor protein can be detected with specific antibodies, for example on a western blot. The electrophoretic mobility shift assay (EMSA)[70] can be used to detect the activation profile of transcription factors. A multiplex approach to activation profiling is a TF chip system, in which several different transcription factors can be detected in parallel. This technology is based on DNA microarrays that present the specific DNA-binding sequence for each transcription factor protein on the array surface.[71]
As described in more detail below, transcription factors may be classified by their (1) mechanism of action, (2) regulatory function, or (3) sequence homology (and hence structural similarity) in their DNA-binding domains.
There are three mechanistic classes of transcription factors:
Examples of specific transcription factors[73]
Factor | Structural type | Recognition sequence | Binds as |
---|---|---|---|
SP1 | Zinc finger | 5'-GGGCGG-3' | Monomer |
AP-1 | Basic zipper | 5'-TGA(G/C)TCA-3' | Dimer |
C/EBP | Basic zipper | 5'-ATTGCGCAAT-3' | Dimer |
Heat shock factor | Basic zipper | 5'-XGAAX-3' | Trimer |
ATF/CREB | Basic zipper | 5'-TGACGTCA-3' | Dimer |
c-Myc | Basic-helix-loop-helix | 5'-CACGTG-3' | Dimer |
Oct-1 | Helix-turn-helix | 5'-ATGCAAAT-3' | Monomer |
NF-1 | Novel | 5'-TTGGCXXXXXGCCAA-3' | Dimer |
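The recognition sequences in the table above can be turned into simple search patterns. The sketch below is one possible way to do so, under two assumptions about the table's notation that the source does not spell out: `(G/C)` means either base, and `X` means any base. The helper names are hypothetical, and only the given strand is scanned, not its reverse complement.

```python
import re

# Recognition sequences copied from the table (5' to 3').
SITES = {
    "SP1": "GGGCGG",
    "AP-1": "TGA(G/C)TCA",
    "c-Myc": "CACGTG",
    "NF-1": "TTGGCXXXXXGCCAA",
}

def to_regex(site):
    """Translate the table's notation into a regular expression."""
    pattern = re.sub(r"\((\w)/(\w)\)", r"[\1\2]", site)  # (G/C) becomes [GC]
    return pattern.replace("X", "[ACGT]")                 # X is assumed to mean any base

def find_sites(sequence, site):
    """Return 0-based start positions of all matches, including overlapping ones."""
    regex = re.compile(f"(?=({to_regex(site)}))")  # lookahead so matches may overlap
    return [m.start() for m in regex.finditer(sequence.upper())]

example = "AATGACTCAGGGGCGGTT"  # made-up sequence for illustration
for name, site in SITES.items():
    print(name, find_sites(example, site))
```

An exact pattern match like this only identifies candidate sites. As the section on binding sites explains, actual binding also depends on related non-consensus sequences, DNA accessibility, and cofactors.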
Transcription factors have been classified according to their regulatory function:[8]
Transcription factors are often classified based on the sequence similarity and hence the tertiary structure of their DNA-binding domains:[74][75][76]